Fix(reasoning_gym/games/countdown): Resolve SymPy parsing conflict for 10+ input numbers #514
Hi!
Thanks for your work! I really appreciate your effort in building a unified benchmark. This PR addresses a bug in the `reasoning_gym/games/countdown` module where the SymPy symbol replacement logic fails when the number of input numbers (`num_terms`) is 10 or greater.
🐛 Problem
The current implementation for generating and validating the arithmetic expression breaks down when there are more than 9 unique symbols (representing the input numbers). This causes the expression string to contain incorrect, phantom numbers, leading to an `AssertionError` during the final answer validation.
The error occurs because the original symbol naming scheme creates ambiguity when the indices transition from single to double digits.
🐞 Root Cause
The original symbol generation used `syms = symbols(f"x:{num_terms}")`, resulting in symbols like `x0, x1, ..., x9, x10, x11, ...`.
During the subsequent string replacement, which substitutes the symbol names (`x10`, `x11`, etc.) with the actual numbers, the string `"10"` within `"x10"` was mistakenly matched and replaced before the full `"x10"` symbol, causing a collision.
Example Trace:
In an example with 15 numbers, symbols `x10` through `x14` were incorrectly processed, leading to the unexpected presence of numbers like -2510, -2511, etc. (which are not in the available set) in the final expression string. Such an expression is obviously incorrect.
The symbol generation is corrected to explicitly include an underscore separator, ensuring that each symbol name is unique and unambiguous during string replacement, regardless of the number of terms.
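To make the failure mode concrete, here is a minimal sketch of how sequential substring replacement can produce phantom numbers once indices reach two digits. It assumes the substitution is a plain `str.replace` loop over the symbols in index order; the helper `naive_substitute`, the expression, and the values are hypothetical and are not taken from the module. In this sketch the collision arises because a short name such as `x1` is a prefix of `x10` and `x11`.

```python
def naive_substitute(expr_str, values):
    """Substitute symbols one at a time with plain substring replacement.

    Hypothetical helper: mirrors a naive loop, not the module's actual code.
    """
    for i, value in enumerate(values):
        # "x1" also matches the first two characters of "x10", "x11", ...
        expr_str = expr_str.replace(f"x{i}", str(value))
    return expr_str


# Made-up input numbers; index 1 holds 25, index 10 holds 40, index 11 holds 11.
values = [3, 25, 7, 4, 9, 12, 5, 8, 6, 2, 40, 11]
expr = "x1 + x10 - x11"

result = naive_substitute(expr, values)
print(result)  # 25 + 250 - 251 (intended: 25 + 40 - 11)
```

The numbers 250 and 251 are not in the input list, which is the kind of phantom value that would trip a final-answer assertion.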
✅ Fix
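The symbol-generation line, originally `syms = symbols(f"x:{num_terms}")`, is updated so that it produces names of the form `x_N`.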
The logic for `expr_str` generation has been updated accordingly to handle the new `x_N` format.
With this fix, the expression is generated correctly using only the valid input numbers, and the answer is 4109, which is correct.
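As an illustration of replacement logic that stays unambiguous with `x_N` names, the sketch below substitutes every symbol in a single pass by matching whole tokens with a regular expression. This is one possible approach under stated assumptions, not necessarily what this PR implements; the helper `substitute_symbols`, the expression, and the values are hypothetical.

```python
import re


def substitute_symbols(expr_str, values):
    """Replace each whole x_<index> token with values[index] in one pass.

    Hypothetical helper for illustration only.
    """
    def lookup(match):
        # group(1) is the full index, e.g. "10" for the token "x_10"
        return str(values[int(match.group(1))])

    # \b on both sides ensures "x_1" never matches inside "x_10".
    return re.sub(r"\bx_(\d+)\b", lookup, expr_str)


# Made-up input numbers; index 1 holds 25, index 10 holds 40, index 11 holds 11.
values = [3, 25, 7, 4, 9, 12, 5, 8, 6, 2, 40, 11]
expr = "x_1 + x_10*x_11 - x_2"

result = substitute_symbols(expr, values)
print(result)  # 25 + 40*11 - 7
```

Because each token is matched in full and looked up by its index, the outcome does not depend on the order in which symbols are processed.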